Toponym Disambiguation in English-Lithuanian SMT System with Spatial Knowledge

Authors

  • Raivis Skadins
  • Tatiana Gornostay
  • Valters Sics
Abstract

This paper presents research resulting in an English-Lithuanian statistical factored phrase-based machine translation system with a spatial ontology. The system is based on the Moses toolkit and is enriched with semantic knowledge inferred from the spatial ontology. The ontology was developed on the basis of the GeoNames database (more than 15,000 toponyms), implemented in the Web Ontology Language (OWL), and integrated into the machine translation process. Spatial knowledge was added as an additional factor in the statistical translation model and used for toponym disambiguation during machine translation. The implemented machine translation approach was evaluated against a baseline system without spatial knowledge. A multifaceted evaluation strategy, including automatic metrics, human evaluation, and linguistic analysis, was used in the evaluation experiments. The results show a slight improvement in the output quality of machine translation with spatial knowledge.
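As an illustration of what "an additional factor" can look like in practice, the following is a minimal sketch that attaches a spatial label to each token in the pipe-separated factored format used by Moses (surface|factor). The gazetteer entries, the class labels, the `O` placeholder, and the `annotate` helper are all hypothetical and chosen for illustration; they are not taken from the paper's ontology or pipeline.

```python
# Hypothetical mini-gazetteer: lowercase surface form -> spatial class.
# Entries and labels are illustrative assumptions, not the paper's ontology.
GAZETTEER = {
    "riga": "city",
    "lithuania": "country",
    "nemunas": "river",
}

# Assumed placeholder factor for tokens without a spatial annotation.
NO_SPATIAL = "O"


def annotate(sentence: str) -> str:
    """Return the sentence as factored tokens of the form surface|spatial_factor."""
    factored = []
    for token in sentence.split():
        # Look up the token case-insensitively; fall back to the placeholder.
        factor = GAZETTEER.get(token.lower(), NO_SPATIAL)
        factored.append(f"{token}|{factor}")
    return " ".join(factored)


if __name__ == "__main__":
    print(annotate("Nemunas flows through Lithuania"))
    # prints: Nemunas|river flows|O through|O Lithuania|country
```

A real system would derive the factor from the ontology rather than a flat dictionary and would need to handle multi-word toponyms and ambiguous names, which this sketch does not attempt.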

Similar resources

Improving SMT with Morphology Knowledge for Baltic Languages

In recent years, several machine translation systems have been built for the Baltic languages. Besides the Google and Microsoft machine translation engines and research experiments with statistical MT for Latvian [1] and Lithuanian, there are both English-Latvian [2] and English-Lithuanian [3] rule-based MT systems available. Both Latvian and Lithuanian are morphologically rich languages with qu...

Resolving fine granularity toponyms: Evaluation of a disambiguation approach

Landscape descriptions in natural language, for instance from historic corpora, are a complementary source to empirical ethnographic work, for example to research exploring variation in the use of basic levels or basic terms within landscapes across localities (c.f. Mark and Turk 2003, Burenhult and Levinson 2008, Turk et al. 2011), on the condition that such descriptions can be linked to space...

Data Pre-Processing to Train a Better Lithuanian-English MT System

Pried -as ir Protokol -as yr -a neatskiriam -a ši -o Susitar -imo dal -is. Prefixes separated, endings replaced by tense and number feature values System #2 Prefixes separated, all endings replaced by number feature values and verb endings also by time feature values System #3 Prefixes separated, endings deleted System #4 As Lithuanian is a highly inflected language, the words change the form acc...

English-Lithuanian-English Machine Translation lexicon and engine: current state and future work

Gintaras Barisevičius, Bronius Tamulynas (Kaunas University of Technology). This article overviews the current state of the English-Lithuanian-English machine translation system. The first part of the article describes the problems that the system poses today and what actions will be taken to solve them in...

Geo-referencing Place from Everyday Natural Language Descriptions

Natural language place descriptions in everyday communication provide a rich source of spatial knowledge about places. An important step to utilize such knowledge in information systems is geo-referencing all the places referred to in these descriptions. Current techniques for geo-referencing places from text documents are using place name recognition and disambiguation; however, place descript...

Publication date: 2011